
Support tables with non-lowercase names in Druid #7197

Closed
dheerajkulakarni wants to merge 1 commit into trinodb:master from miqdigital:trino-6850

Conversation

@dheerajkulakarni
Member

This PR addresses #6850

Changes in this PR

  1. Call getRemoteTable and getRemoteSchema before filtering in getTableHandle(ConnectorSession session, SchemaTableName schemaTableName).
  2. Add test cases for this in "TestCaseInSensitiveMapping".
  3. Add a copyAndIngestTpch function to copy an existing dataSource under a different dataSource name.

@cla-bot cla-bot Bot added the cla-signed label Mar 7, 2021
@dheerajkulakarni
Member Author

dheerajkulakarni commented Mar 7, 2021

Hey @findepi @hashhar @Praveen2112, I've raised a draft PR as I need some inputs from you to go ahead.

The block of code I highlighted in the attached image (io.trino.plugin.druid.BaseJdbcClient#toRemoteSchemaName) is failing the DruidIntegrationSmokeTest cases. When case-insensitive-mapping is false, which seems to be the default behavior, this block of code gets executed, and storesUpperCaseIdentifiers() appears to return true by default, so the schema name and table name are returned in upper case. Because of that, Druid cannot find the table or schema, as they are originally in lowercase. Before this change these remote functions were never called, so there was no problem at all. Do you think this property has to be set explicitly to false somehow in the test cases? Also, I saw that the TrinoMetaData class has this property set to false by default, so isn't this supposed to return false by default?

[Screenshot 2021-03-06 at 12:19:25 PM: the highlighted block in BaseJdbcClient#toRemoteSchemaName]
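To make the failure mode concrete, the case-folding logic being described can be sketched roughly like this (a simplified illustration, not the actual BaseJdbcClient code; names and signatures are illustrative):

```java
import java.util.Locale;

public class RemoteNameMappingSketch
{
    // Sketch of the kind of remote-name mapping being discussed. When
    // case-insensitive matching is disabled, a BaseJdbcClient-style client
    // consults the driver's DatabaseMetaData flags to decide how to fold
    // identifier case. If the driver wrongly reports
    // storesUpperCaseIdentifiers() as true, a lowercase Druid datasource
    // name is folded to upper case and the lookup fails.
    public static String toRemoteName(
            String localName,
            boolean storesUpperCaseIdentifiers,
            boolean storesLowerCaseIdentifiers)
    {
        if (storesUpperCaseIdentifiers) {
            return localName.toUpperCase(Locale.ENGLISH);
        }
        if (storesLowerCaseIdentifiers) {
            return localName.toLowerCase(Locale.ENGLISH);
        }
        // case-preserving driver: pass the name through unchanged
        return localName;
    }

    public static void main(String[] args)
    {
        // The problematic path: the driver claims upper-case storage,
        // so "druid" becomes "DRUID" and the schema is not found.
        System.out.println(toRemoteName("druid", true, false));
    }
}
```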

@hashhar
Member

hashhar commented Mar 7, 2021

I observed TrinoMetaData class which has this property set as false by default, so isn't this supposed to return false by default?

You are almost correct, but looking in the wrong place. The JdbcMetadata we are fetching via connection.getMetaData() comes from the Druid JDBC driver (because that's the driver underlying the connection).
So the correct place to check is why Druid's JDBC driver (https://github.com/apache/calcite-avatica) reports storing upper case as true when that differs from its actual behaviour.

@hashhar hashhar self-requested a review March 7, 2021 14:03
@dheerajkulakarni
Member Author

Hey @hashhar, I thought the TrinoDataMetaData class was a kind of base for all the JdbcMetaData classes in all the drivers and was referring to it; maybe I was completely wrong. If we agree that it has to return false by default, then instead of trying to set this property in the test cases I will try to find out why this returns true instead of false, and correct it if possible.

@hashhar
Member

hashhar commented Mar 7, 2021

I think the driver is lying (because it uses a translation layer, Apache Calcite, instead of being a focused, purpose-built JDBC driver).

If we know Druid's behaviour when storing to be one of case-preserving, upper-casing, or lower-casing, then you can maybe add a copy of the toRemoteSchemaName and toRemoteTableName methods inside trino-druid for now and file an issue with the JDBC driver to fix JdbcMetadata for Druid.

This turned out to be a bit more complex than expected. Thanks for putting the effort into this @dheerajkulakarni.

@findepi findepi changed the title added support for querying tables with upper case character names Support tables with non-lowercase names in Druid Mar 8, 2021
@hashhar
Member

hashhar commented Mar 8, 2021

Looks like the Avatica driver has some connection properties that affect the JdbcMetadata. See https://calcite.apache.org/docs/adapter.html#jdbc-connect-string-parameters.

The default behavior is inherited from Oracle (see the lex property). You might need to change DruidJdbcClientModule to set the proper connection properties before opening a connection from the factory (see MySqlClientModule for an example).

The relevant ones to set are either lex, or a combination of caseSensitive=TRUE, quoting=DOUBLE_QUOTE, quotedCasing=UNCHANGED, and unquotedCasing=TO_LOWER.
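For illustration, building those Avatica connection properties might look like this (a sketch only; the actual wiring inside DruidJdbcClientModule would differ, and whether Druid's driver honours these parameters end-to-end is exactly what was in question here):

```java
import java.util.Properties;

public class AvaticaConnectionPropertiesSketch
{
    // Build the Calcite/Avatica connect-string parameters suggested above.
    // These would typically be handed to the driver when the connection
    // factory opens a connection.
    public static Properties druidConnectionProperties()
    {
        Properties properties = new Properties();
        properties.setProperty("caseSensitive", "TRUE");
        properties.setProperty("quoting", "DOUBLE_QUOTE");
        properties.setProperty("quotedCasing", "UNCHANGED");
        properties.setProperty("unquotedCasing", "TO_LOWER");
        return properties;
    }

    public static void main(String[] args)
    {
        // e.g. DriverManager.getConnection(jdbcUrl, druidConnectionProperties())
        System.out.println(druidConnectionProperties());
    }
}
```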

@dheerajkulakarni
Member Author

Thank you so much for the info @hashhar, I will go through this and will make the code changes accordingly!

@dheerajkulakarni dheerajkulakarni marked this pull request as ready for review March 10, 2021 16:06
@dheerajkulakarni
Member Author

@hashhar as discussed, since there is no way to set these properties at the JDBC driver level, I have overridden the toRemoteSchema and toRemoteTable functions in DruidJdbcClient.
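A rough shape for that override (a hypothetical sketch; the real remote-name methods in BaseJdbcClient take more arguments, such as the connection and identifier mapping):

```java
public class DruidRemoteNamesSketch
{
    // Sketch: since Druid stores datasource names case-sensitively and the
    // Avatica driver misreports its identifier-casing behaviour, the Druid
    // client can bypass the metadata-driven case folding entirely and pass
    // names through unchanged. Method names here are illustrative.
    public static String toRemoteSchemaName(String schemaName)
    {
        // do not fold case based on DatabaseMetaData.storesUpperCaseIdentifiers()
        return schemaName;
    }

    public static String toRemoteTableName(String remoteSchema, String tableName)
    {
        return tableName;
    }

    public static void main(String[] args)
    {
        System.out.println(toRemoteSchemaName("druid"));
        System.out.println(toRemoteTableName("druid", "camelcase"));
    }
}
```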

@hashhar
Member

hashhar commented Mar 10, 2021

Let's also create an issue with either Druid (or, more correctly, Apache Calcite, since they build and ship the Avatica driver) to allow the relevant configuration properties to be passed to Calcite so that JdbcMetadata doesn't lie about the actual behaviour.

Thanks for the work, I will review this.

Member

@hashhar hashhar left a comment


Some changes requested.

In particular, the testTableNameClash test seems incorrect.

Member


Works until it breaks.
Can we read the indexTask as a JsonNode, replace the two nodes we are interested in, and return the modified JsonNode as a string?
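The suggested approach, sketched with Jackson (which the Trino codebase already depends on); the field path below is illustrative, not Druid's actual ingestion-spec layout:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class IndexTaskRewriteSketch
{
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Parse the index task spec as a JSON tree, replace the field of
    // interest, and serialize it back, instead of doing string replacement
    // on the raw JSON text.
    public static String renameDataSource(String indexTaskJson, String newName)
            throws Exception
    {
        JsonNode root = MAPPER.readTree(indexTaskJson);
        // replace the datasource-name node (path is illustrative)
        ((ObjectNode) root.at("/spec/dataSchema")).put("dataSource", newName);
        return MAPPER.writeValueAsString(root);
    }

    public static void main(String[] args)
            throws Exception
    {
        String spec = "{\"spec\":{\"dataSchema\":{\"dataSource\":\"region\"}}}";
        System.out.println(renameDataSource(spec, "camelcase"));
    }
}
```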

Member


I would like the method name to be clearer, but I can't think of anything better.

We are basically taking an existing index task, changing the destination name and ingesting under the new name. Something like a CREATE TABLE AS SELECT * FROM file.

import static java.util.Objects.requireNonNull;

final class RemoteTableNameCacheKey
public final class RemoteTableNameCacheKey
Member


Please create a GitHub issue here mentioning the reason why you did it this way and what needs to be fixed before we can do it the "correct" way.

Add a TODO comment here referring to that issue so that we can eventually clean up the code instead of changing the API in unneeded ways.

Member


Same TODO over the now public constructor.

if (tableHandles.isEmpty()) {
return Optional.empty();
}

Member


nit: revert whitespace changes.

try (ResultSet resultSet = getTables(connection, Optional.of(remoteSchema), Optional.of(remoteTable))) {
List<JdbcTableHandle> tableHandles = new ArrayList<>();
while (resultSet.next()) {
tableHandles.add(new JdbcTableHandle(
Member


Suggested change
tableHandles.add(new JdbcTableHandle(
tableHandles.add(new JdbcTableHandle(
schemaTableName,
getRemoteTable(resultSet));
private static RemoteTableName getRemoteTable(ResultSet resultSet)
{
return new RemoteTableName(
Optional.of(DRUID_CATALOG),
Optional.ofNullable(resultSet.getString("TABLE_SCHEM")),
resultSet.getString("TABLE_NAME"));
}

.row("shippriority", "bigint", "", "") // Druid doesn't support int type
.row("totalprice", "double", "", "")
.build();
MaterializedResult actualColumns = computeActual("DESCRIBE " + "CamelCase");
Member


No need for string concatenation here.

.row("totalprice", "double", "", "")
.build();
MaterializedResult actualColumns = computeActual("DESCRIBE " + "CamelCase");
Assert.assertEquals(actualColumns, expectedColumns);
Member


nit: static import.

copyAndIngestTpchData(getQueryRunner().execute(SELECT_FROM_REGION + " LIMIT 10"), this.druidServer, "region", "camelcase");
}
catch (AssertionError e) {
Assert.assertEquals(e.getMessage(), "Datasource camelcase not loaded expected [true] but found [false]");
Member


Can you share the entire stack trace?

This exception itself can also happen if something fails on the Druid end rather than on our side.

throws IOException, InterruptedException
{
try {
//ingesting data with already existing table name in lowercase which should fail
Member


I'd expect ingestion to work since Druid is case-sensitive, but querying such tables should fail from Trino.

MaterializedResult materializedRows = computeActual("SELECT * FROM druid.druid.CAMELCASE");
Assert.assertEquals(materializedRows.getRowCount(), 10);
MaterializedResult materializedRows1 = computeActual("SELECT * FROM druid.CamelCase");
MaterializedResult materializedRows2 = computeActual("SELECT * FROM druid.camelcase");
Member


Can be simplified using assertQuery("SELECT COUNT(1) FROM druid.druid.camelcase", "VALUES 10")

@bitsondatadev
Member

👋 @dheerajkulakarni - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.
